Towards Zulu corpus clean-up, lexicon development and corpus annotation by means of computational morphological analysis

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Corpus-Induced Corpus Clean-up

We explore the feasibility of using only unsupervised means to identify non-words, i.e. typos, in a frequency list derived from a large corpus of Dutch and to distinguish between these non-words and real-words in the language. We call the system we built and evaluate in this paper CICCL, which stands for ‘Corpus-Induced Corpus Clean-up’. The algorithm on which CICCL is primarily based is the an...

متن کامل

Ukwabelana - An open-source morphological Zulu corpus

Zulu is an indigenous language of South Africa, and one of the eleven official languages of that country. It is spoken by about 11 million speakers. Although it is similar in size to some Western languages, e.g. Swedish, it is considerably under-resourced. This paper presents a new open-source morphological corpus for Zulu named Ukwabelana corpus. We describe the agglutinating morphology of Zul...

متن کامل

A Large Semantic Lexicon for Corpus Annotation

Semantic lexical resources play an important part in both corpus linguistics and NLP. Over the past 14 years, a large semantic lexical resource has been built at Lancaster University. Different from other major semantic lexicons in existence, such as WordNet, EuroWordNet and HowNet, etc., in which lexemes are clustered and linked via the relationship between word/MWE senses or definitions of me...

متن کامل

Automatic Lexicon Enhancement by Means of Corpus Tagging

Using specialised text corpus to automatically enhance a general lexicon is the aim of this study. Indeed, having lexicons which offer maximal cover on a specific topic is an important benefit in many applications of Automatic Speech and Natural Language Processing. The enhancement of these lexicons can be made automatic as big corpora of specialised texts are available. A syntactic tagging pro...

متن کامل

Morphological Annotation of the Lithuanian Corpus

As the development of information technologies makes progress, large morphologically annotated corpora become a necessity, as they are necessary for moving onto higher levels of language computerisation (e. g. automatic syntactic and semantic analysis, information extraction, machine translation). Research of morphological disambiguation and morphological annotation of the 100 million word Lith...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: South African Journal of African Languages

سال: 2011

ISSN: 0257-2117,2305-1159

DOI: 10.1080/02572117.2019.12063275